An Objective Method for Forecasting Solar Flares


1.  INTRODUCTION

This report is a continuation of an earlier study (Hirman et al, 1980) in which
multivariate discriminant analysis (MVDA) is used in a computer program to
produce an objective daily solar flare forecast. The essential feature of the statistics
package is the comparison between a number of input parameters and a number of
output classes, in which the discrimination between the classes in terms of the
input parameters is maximized by constructing appropriate classification functions.
In the application to flare prediction, the input parameters are daily solar
parameters for each active region on the solar disk, and the output classes are the levels
of flare activity occurring the following day within the same active regions. We
have  used  more  than  two  years  of  data,  of  which  approximately  'io  percent  has 
been  used  to  derive  the  classification  functions.  The  latter  are  then  extrapolated 
forward  in  time  to  produce  a  true  forecast. 

The computer program, known as BMD07M, was originally written at UCLA,
although the particular version used here was developed further by Seagraves to
include the Cooley and Lohnes classification procedure and the Lachenbruch
N-1 technique. The Cooley and Lohnes procedure does not assume uniformity
of variance, and this sometimes results in better classification scores. The
computational burden, however, is increased because linear classification
functions are not possible; instead, canonical variables, constructed from the
original input parameters, are used as a transformation to reduce the matrix
dimension in the classification formulas. The Lachenbruch technique removes
bias when the program classifies its own data base.
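
The bias removed by the N-1 technique can be illustrated with a leave-one-out loop, in which each record is classified by a rule derived from the other N-1 records. The following is a minimal sketch only: it assumes a toy one-parameter nearest-class-mean rule in place of the actual discriminant functions, and the data and function names are hypothetical.

```python
from statistics import mean

def class_means(records):
    """Mean parameter value per class for a list of (value, label) records."""
    by_class = {}
    for x, label in records:
        by_class.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_class.items()}

def nearest_mean_class(x, means):
    """Assign x to the class with the closest mean (stand-in for a discriminant rule)."""
    return min(means, key=lambda label: abs(x - means[label]))

def resubstitution_accuracy(records):
    """Classify every record with a rule trained on all N records (optimistic)."""
    means = class_means(records)
    hits = sum(nearest_mean_class(x, means) == label for x, label in records)
    return hits / len(records)

def leave_one_out_accuracy(records):
    """Classify each record with a rule trained on the remaining N-1 records."""
    hits = 0
    for i, (x, label) in enumerate(records):
        rest = records[:i] + records[i + 1:]
        hits += nearest_mean_class(x, class_means(rest)) == label
    return hits / len(records)

# Hypothetical one-parameter data base with two outcome classes.
toy = [(0.0, "No Flare"), (1.0, "No Flare"), (2.0, "No Flare"),
       (2.6, "C"), (4.0, "C"), (5.0, "C")]

resub = resubstitution_accuracy(toy)
loo = leave_one_out_accuracy(toy)
```

In this toy case the rule classifies its own data base perfectly, while the leave-one-out score is lower because the borderline record no longer helps to define its own class mean.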

A complete description of the mathematics is beyond the scope of this report.
The reader may consult Anderson and Rao for references on discriminant
analysis. A discussion of the suitability of applying various statistical methods to
discrete input variables is contained in Vecchia et al. The latter point is of
particular interest because the work of Vecchia et al uses the same discrete data
base as used herein, to produce solar flare probability forecasts using discriminant
analysis (without the Cooley and Lohnes procedure) and logistic regression
analysis.

An important feature of the present study is the comparison of the objective,
computer forecast with a subjective, conventional forecast prepared during the
same test period for the same active regions on the sun. Without such a benchmark
for relative evaluation, the presentation of any forecast method has
considerably reduced merit.

2.  DATA

The data used herein were obtained from the region analysis program at the
NOAA Space Environment Services Center (SESC) in Boulder, Colorado. The
region analysis program collects daily a variety of solar parameters for each
active region on the solar disk. It is important to note that there is no attempt
in this program to select the more flare-productive regions. The parameters
include radio and X-ray data, but most are derived from optical data supplied
by the USAF/AWS SOON system. The parameters contain information which the
SESC forecasters consider vital to the preparation of a 24-hour flare forecast.
The present study uses data for the period 1 January 1977 to 31 January 1979,
containing 6095 active-region days (records) that have been checked for errors
and internal consistency. Random scrutiny, however, has shown that errors
still remain. After several reassignments of parameter values and definitions,
we arrived at the form of the data base shown in Table 1.


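Table 1.  SESC Region Analysis Parameters (Modified)

PARAMETER                                                   ASSIGNED VALUE

 1.  DATE
 2.  REGION NUMBER
 3.  REGION'S FIRST APPEARANCE LONGITUDE
 4.  CURRENT LONGITUDE
 5.  N/S LATITUDE
 6.  CURRENT LATITUDE
 7.  CARRINGTON LONGITUDE
 8.  REGION AGE
 9.  SPOT CLASS 1
10.  SPOT CLASS 2
       r/x ................................................ 1
       a .................................................. 3
       h .................................................. 4
       k .................................................. 5
11.  SPOT CLASS 3
       x .................................................. 1
       o .................................................. 2
       i .................................................. 3
       c .................................................. 4
12.  MAGNETIC CLASS
       No spots ........................................... 0
       Alpha .............................................. 1
       Beta ............................................... 2
       Beta-Gamma ......................................... 3
       Gamma .............................................. 4
       Beta-Delta ......................................... 5
       Beta-Gamma-Delta ................................... 6
       Gamma-Delta ........................................ 7
13.  MAGNETIC POLARITY OF STRONGEST FIELD ................. (+/-)
14.  MAGNETIC FIELD STRENGTH .............................. (Gauss)
15.  MAGNETIC GRADIENTS ................................... (Gamma/km)
16.  INTERACTION WITH ANOTHER REGION
       None ............................................... 0
       Spots of opposite polarity converge (from less
       than two degrees apart) ............................ 1
17.  SUNSPOT DYNAMICS
       No spots or no motion .............................. 0
       Coalescing of spots ................................ 1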

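       Spot rotation ...................................... 2
       Relative motion between oppositely poled spots .....
18.  STAGE OF DEVELOPMENT
       No spots ........................................... 0
       Mature group (stable) .............................. 1
       Decaying ........................................... 2
       Growing ............................................ 3
       Rapid decay (spot numbers/areas decrease by > 50%) . 4
       Rapid growth (spot numbers/areas increase by > 50%)  5
       Rapid growth (> 100%) .............................. 6
19.  LEADER/TRAILER FIELDS
       Structure not definite ............................. 0
       Returning Region ................................... 1
       <8 deg of neutral line and out of phase ............ 2
       >8 deg of neutral line and in leader fields ........ 3
       >8 deg of neutral line and in trailer fields .......
       <8 deg of neutral line and in-phase ................
20.  RETURNING REGION
21.  SECTOR BOUNDARY RELATIONSHIP
22.  ASSOCIATED FILAMENT
       None ...............................................
       Filament unchanged .................................
       Filament growing ...................................
       Filament disappeared within past 24 hrs ............
       Filament darkens or is active ......................
23.  EMBEDDED FILAMENT
       None ...............................................
       Filament present ...................................
       Active filament ....................................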


31.  ISOLATED POLE
32.  EMERGING FLUX
       None, or region is new ............................. 0
       New flux emerges within spot group ................. 1
       New flux emerges near region (within deg) ..........
33.  ARCH FILAMENT SYSTEM
34.  RADIO BURST/SWEEP
       None occurred ...................................... 0
       >250 flux units at 10 cm ........................... 1
       >1000 flux units at 10 cm .......................... 2
       Type III ........................................... 3
       Type IV ............................................ 4
       Type II and IV ..................................... 5
       U Burst ............................................ 6
       Major/complex 10 cm burst .......................... 7
       >1000 flux units at 10 cm plus a U burst, or
       Type III and IV, or
       250 flux units at 10 cm plus Type III and IV ....... 8
35.  REGION'S FIRST APPEARANCE (TRANSIT HISTORY)
36.  FLARE HISTORY
       No flares have occurred ............................ 0
       C class flares have occurred ....................... 1
       M class flares have occurred ....................... 2
       X class flares have occurred ....................... 3
37.  FLARES TODAY
       None

Most of the parameters in Table 1 have been assigned discrete values
according to categories which are subjectively related to increasing flare
activity. This subjectivity is the weakest link in any scheme utilizing objective
procedures for producing a forecast solely from data. In essence, the situation
merely allows the element of subjectivity to reside entirely in the data
acquisition process. Probably, this situation is preferable to having subjectivity
introduced also in the forecast preparation. There are several parameters
(e.g., spot class, flare history, magnetic class) for which assigned values are
based upon quantitative studies. Fortunately (or perhaps therefore), these
parameters are among those from which the objective forecast derives most of its skill.

Perhaps the most unfortunate circumstance is that for a large number of
records one or more parameters is missing. In the computer program, missing
data codes are replaced by averages for the particular parameter in the
set of records used in deriving the classification functions. Missing data, in
addition to errors, makes the testing of objective techniques difficult, especially
for determining the relative significance of various parameters. In
order to portray some feeling for the degree of representation in the data base
we note the following: for three commonly observed parameters, "Spot Class
2," "Magnetic Class," and "Flares Today," only 5893 of the total 6095 records
contain all three; if "Bright Points," "Spot Class 3," "Spot Class 1," "Magnetic
Gradients," and "Sunspot Dynamics" are added to the first three, only 3732
records remain; and for a total of 15 of the 31 usable parameters, only 510
records remain.

Nevertheless, we are able to show later that at least some of those frequently
missing parameters contain valuable predictive information.
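
The replacement of missing-data codes by training-set averages can be sketched as follows. This is an illustration under stated assumptions, not the program's own routine: the missing-data code (`None`), the function names, and the fallback value used when a parameter is absent from every training record are all hypothetical.

```python
MISSING = None  # hypothetical missing-data code

def training_means(training_records, n_params):
    """Average each parameter over the training records in which it is present."""
    means = []
    for j in range(n_params):
        present = [r[j] for r in training_records if r[j] is not MISSING]
        # Assumed fallback: 0.0 when a parameter never appears in the training set.
        means.append(sum(present) / len(present) if present else 0.0)
    return means

def fill_missing(record, means):
    """Replace each missing value in a record by the training-set mean."""
    return [means[j] if v is MISSING else v for j, v in enumerate(record)]

# Three hypothetical training records with three parameters each.
train = [[1, 4, MISSING], [3, MISSING, 2], [MISSING, 6, 4]]
means = training_means(train, 3)
filled = fill_missing([MISSING, 5, MISSING], means)
```

Because the averages come only from the training set, a test record is completed without any use of information from the test period.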

The data base contains daily, region-by-region entries for the actual flare
activity, in addition to the official SESC subjectively derived flare forecast.
Thus, the information required for objective forecast testing, as well as for
comparison with the SESC forecast, is contained in the same base. Flares are
listed according to their peak soft (1-8 Å) X-ray flux at 1 AU, in the classes C, M, and X.
From the standpoint of geophysical environment studies, the classes M and X
are of greatest importance.

In addition to Table 1, six combination parameters (Table 2), derived from
certain original parameters, were included as input parameters. These six were
found to have possible predictive significance in the earlier study (Hirman et al,
1980) where twenty such combination parameters were tested. The derivation of
combination parameters is based on intuitions about the form in which predictive
information might be contained in the data, and about physical quantities (e.g.,
energy stored in sheared magnetic fields) presumed relatable to flares. The subject
of these and other combination parameters will be discussed in a later section.


Table 2.  Combination Parameters (Numbers in right-hand
column refer to original parameter number in Table 1)

New Parameter No.        Parameter Formula

        1                9 · 10 · 11
        2                9 · 10 · 11 · 12
        3                9 · 10 · 11 · 32
        4                14 · 15 · (17 + 25)
        5                12 · 17 · 27
        6                17 · (25 + 27 + 28)
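
As an illustration of how the Table 2 formulas might be evaluated for one region-day, the sketch below reads the separators in the formulas as products, which is an assumption. The function name and the sample values are hypothetical; the dictionary keys are the Table 1 parameter numbers.

```python
def combination_parameters(p):
    """Return the six Table 2 combination parameters for one record.

    p maps Table 1 parameter numbers to numeric values (hypothetical layout).
    """
    return {
        1: p[9] * p[10] * p[11],
        2: p[9] * p[10] * p[11] * p[12],
        3: p[9] * p[10] * p[11] * p[32],
        4: p[14] * p[15] * (p[17] + p[25]),
        5: p[12] * p[17] * p[27],
        6: p[17] * (p[25] + p[27] + p[28]),
    }

# Hypothetical region-day record; the values are chosen only for illustration.
record = {9: 4, 10: 3, 11: 2, 12: 3, 14: 2000, 15: 0.5,
          17: 1, 25: 1, 27: 2, 28: 0, 32: 1}
new = combination_parameters(record)
```

A product of discrete codes is large only when every factor is large, so a single zero code (for example, no sunspot motion) removes the contribution of the whole combination.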


8.  Hirman, J. W., Neidig, D. F., Seagraves, P. H., Flowers, W. E., and
Wiborg, P. H. (1980) in Sol.-Terrest. Pred. Proc., Vol. 3, R. F. Donnelly
(ed.), C-64.
The region analysis parameters for today are independent of any information
on flare activity occurring tomorrow; therefore, they can be used in practice,
today, to produce a flare forecast for tomorrow, assuming that predictive
information is present in the parameters. We have used the first N records (with N = 1500,
as described below) as a "training set" in order to derive the classification
functions for three possible outcomes: "No Flare," "C Flare," and "M or X Flare."
M and X flares were grouped together as a single class in order to reduce
statistical noise caused by the relatively few cases of larger flares. The classification
functions were then applied to new records, using only the input parameters, in
order to produce a true forecast. The latter procedure was accomplished in
steps of 250 records each, with the training set sliding forward in time, 250
records (approximately one month) after each step. Thus, for a 1500-record
training set, the remaining 4595-record test set requires 19 individual subsets
of 250 records each (except for the nineteenth). This sliding base technique
maintains a constant N records in the training set, thereby assuring that the program
is trained on recent data relative to the test subset. This, combined with the
relatively small size of the test subset, minimizes the effects of secular trends,
either of observational or solar origin, which might be present in the data.
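
The sliding base technique can be sketched as a generator of index ranges. The record counts (6095 total, 1500 training, steps of 250) are taken from the text; the function name and the end-exclusive indexing are assumptions made for illustration.

```python
def sliding_subsets(total_records, train_size, step):
    """Yield (train_start, train_end, test_start, test_end), end-exclusive.

    Each test subset is forecast from the train_size records that
    immediately precede it, so the training set slides forward in time.
    """
    test_start = train_size
    while test_start < total_records:
        test_end = min(test_start + step, total_records)
        yield (test_start - train_size, test_start, test_start, test_end)
        test_start = test_end

splits = list(sliding_subsets(6095, 1500, 250))
first = splits[0]    # (0, 1500, 1500, 1750)
last = splits[-1]    # (4500, 6000, 6000, 6095)
```

With these numbers the loop produces 19 test subsets, the last of which holds the 95 records left over after eighteen full steps.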

The computer program was trained on the X-ray class of the largest event
(No Flare, C Flare, or M & X Flare) occurring in the region in the 24-hour
period following the acquisition date of the input parameters. Thus, the computer
forecast is expressed in terms of probabilities for the largest event to be in one
of these classes. The outcomes are mutually exclusive, with the sum of
probabilities over all classes equal to unity. The SESC forecast, however, is a probability
forecast for the occurrence of each class of event; i.e., a non-exclusive format.

In order to assess the quality of the computer forecast, we derived a comparison
forecast in the "exclusive" format by selecting the largest event class in the
SESC forecast that was assigned a probability greater than or equal to 0.5.
Although this is not an SESC forecast, it is probably representative of what would
be extant if the SESC chose to cast their predictions in this mode.
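
The selection rule for the comparison forecast can be sketched as below. The 0.5 threshold is from the text; the dictionary layout, the class labels used as keys, and the default to "No Flare" when no flare class reaches the threshold are assumptions.

```python
# Event classes ordered from smallest to largest.
CLASSES = ["No Flare", "C", "M & X"]

def exclusive_forecast(probabilities, threshold=0.5):
    """Pick the largest flare class whose non-exclusive probability >= threshold.

    probabilities maps a flare class to its forecast probability.
    Assumed default: "No Flare" when no flare class reaches the threshold.
    """
    for event_class in reversed(CLASSES[1:]):
        if probabilities.get(event_class, 0.0) >= threshold:
            return event_class
    return "No Flare"

a = exclusive_forecast({"C": 0.6, "M & X": 0.2})
b = exclusive_forecast({"C": 0.8, "M & X": 0.5})
c = exclusive_forecast({"C": 0.3, "M & X": 0.05})
```

Scanning from the largest class downward ensures that a region forecast at 0.5 or more for both C and M & X is counted once, as an M & X forecast.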

In the following test results we present the forecasts according to both the
standard multivariate discriminant analysis (MVDA) and the Cooley and Lohnes
procedure (MVDA/CL). There are important differences in the character of
these two forecasts, which, as will be shown later, may be used to advantage.
4.  TEST RESULTS


4.1  Preliminary Discussion

We are concerned mainly with the behavior of the computer forecasts
relative to the comparison forecast when, for example, changes are made in the
size of the training set, choice of input parameters, solar activity levels, and
percent of missing data. In all cases we present the computer forecast along
with the comparison forecast for the same set of test records. Also included is
a list of input parameters submitted to analysis, along with their frequency of
selection in classifying the three outcomes. Note, however, that due to the
250-record increment the training sets are independent of each other only when
separated by six or more subsets.

As a first step, we eliminated 11 parameters which were not selected in any
of the 19 subsets. Following this, the program was run again using the
remaining 20 input parameters. The results are given in Tables 3, 4, and 5. This
test (A) will serve as an example for the displays used elsewhere in this report.

Table 3 shows the actual matrix of region-day forecasts vs region-day
largest events, for the three forecasts. Table 5, derived from the data in Table
3, summarizes the following:

F       Percent of forecasts correct in the given event class
E       Percent of region-day largest events which were forecasted
A       (F + E)/2
C       Climatology (percent of the total number of events in the class)
U       Unweighted mean of the A's for all three event classes
W       Weighted mean forecast accuracy (the sum of the matrix
        diagonal elements divided by the total number of forecasts,
        or events, in all classes)
Off 1   Percent of forecasts that are one matrix element away from
        the diagonal
Off 2   Percent of forecasts that are two matrix elements away from
        the diagonal
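
The score definitions above can be sketched as a short routine. The function name is hypothetical, and the sample matrix is an illustration chosen to be consistent with the Comparison scores of Test A (rows are the forecast class, columns the observed class, in the order No Flare, C, M & X). The sketch assumes every class has at least one forecast and one event; otherwise F or E is undefined.

```python
def forecast_scores(matrix):
    """Compute F, E, A, C per class and U, W, Off 1, Off 2 overall (percent).

    matrix[i][j] = number of forecasts of class i for which class j was observed.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    F, E, A, C = [], [], [], []
    for k in range(n):
        forecasts = sum(matrix[k])                    # row total
        events = sum(matrix[i][k] for i in range(n))  # column total
        f = 100.0 * matrix[k][k] / forecasts
        e = 100.0 * matrix[k][k] / events
        F.append(f)
        E.append(e)
        A.append((f + e) / 2.0)
        C.append(100.0 * events / total)

    def off(distance):
        """Percent of all forecasts lying `distance` elements from the diagonal."""
        count = sum(matrix[i][j] for i in range(n) for j in range(n)
                    if abs(i - j) == distance)
        return 100.0 * count / total

    return {"F": F, "E": E, "A": A, "C": C, "U": sum(A) / n,
            "W": off(0), "Off 1": off(1), "Off 2": off(2)}

comparison = [[3376, 213, 25],
              [501, 199, 59],
              [77, 82, 63]]
scores = forecast_scores(comparison)
```

Note that W, Off 1, and Off 2 partition the forecasts and therefore sum to 100 percent, whereas U gives each event class equal weight regardless of its climatology.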

These various scores are of interest because of the several ways in which
forecasts can be used. For example, the F score, or percentage of forecasts
that are correct, is the quantity of interest to a customer who cannot tolerate
false alarms. A quite different requirement applies, however, in a situation
where surprise flares are unwelcome. In the latter case, the E score is the
important measurement. Of course, knowing the customer's need in advance
allows the forecast to be biased either toward underprediction, which tends to
improve the F score, or toward overprediction, which improves the E score.
Table 3.  Comparison of Forecasts--Test A
(1500-record training set)

                                   Largest Event Observed
Forecaster    Largest Event                                      Total
              Forecasted        No Flare       C      M & X      Forecasts

COMPARISON    No Flare            3376        213       25         3614
              C                    501        199       59          759
              M & X                 77         82       63          222
              Total Events        3954        494      147         4595

MVDA          No Flare            3349        190       26         3565
              C                    513        206       51          770
              M & X                 92         98       70          260
              Total Events        3954        494      147         4595

MVDA/CL       No Flare            3739        316       50         4105
              C                    185        142       50          377
              M & X                 30         36       47          113
              Total Events        3954        494      147         4595

As a measure of the "balanced" accuracy of a forecast in a given event class
we, therefore, introduce the average of F and E, given by A.

The  accuracy  of  a  forecast  is  always  dependent  upon  the  climatology  for 
the  event  being  forecasted.  Higher  climatological  probabilities  tend  to  improve 
the chances for predictions to be correct. For example, it is easy to predict
"No Flare" with 90 percent accuracy, simply because no flare occurs in almost
90  percent  of  all  active-region  days.  In  comparing  cumulative  scores  between 
forecasts  it  is  imperative  to  note  the  climatology  which  prevailed  during  the 
test  period.  Climatology  is  affected  by  a  number  of  factors,  including  event 
classification  criteria,  duration  of  forecast  interval,  and  level  of  solar  activity. 
Table 4.  Parameters Submitted to Analysis and
Their Frequency of Selection in 19 Subsets--Test A

Flares Today     19            New No. 1       9      Mag. Pol.        5
Bright Points    19            Mag. Grad.      9      Neut. L. Chg.    5
New No. 2        17            Mag. Class      6      Spot Class 3     1
Spot Dynam.      12            Radio B/S       6      Spot Class 2     2
New No. 5        12            Flare Hist.     6      Spot Inter.      2
Proton Hist.     11            New No. 3       6      Emerg. Flux      1
Spot Class 1     [illegible]   New No. 4       6

Table 5.  Comparison of Forecast Scores--Test A

Forecaster    Event         F      E      A      C      U      W    Off 1   Off 2

COMPARISON    No Flare    93.4   85.4   89.4   86.1
              C           26.2   40.3   33.3   10.8   52.8   79.2   18.6     2.2
              M & X       28.4   42.9   35.6    3.2

MVDA          No Flare    93.9   84.7   89.3   86.1
              C           26.8   41.7   34.3   10.8   53.6   78.9   18.4     2.6
              M & X       26.9   47.6   37.3    3.2

MVDA/CL       No Flare    91.1   94.6   92.8   86.1
              C           37.7   28.7   33.2   10.8   54.3   85.5   12.8     1.7
              M & X       41.6   32.0   36.8    3.2


In essence, climatology is directly dependent upon "bin size." Failure to state
climatological conditions clearly (an unfortunately common practice) makes
intercomparison of forecasts almost impossible (Simon et al, 1980). It seems that
this point cannot be emphasized enough.

Because "No Flare" constitutes the majority of situations on the sun, it comes
as no surprise that solar flare forecasts are usually quite accurate overall; i.e.,
their weighted means (W) are high. It is of greater interest, however, to predict
flares than quiet conditions and, for this reason, the unweighted score U, given
simply by the mean of the A scores over all classes, has been included in Table 5.

Finally, we note that if a forecast is in error, it is better to be wrong by one
event class than by two. Thus, the tendency for the off-diagonal entries in the
matrix to cluster near the diagonal is an important measure when comparing
forecast scores which are similar otherwise. Table 5 includes a measure of this
error distribution in the form of the Off 1 and Off 2 scores.

The scores (F, E, and A) have uncertainties of approximately ±1, ±3, and
±5 for No Flare, C Flare, and M & X Flare, respectively. The U and W scores
have uncertainties of about ±1. Thus, in terms of A and U, the three forecasts
in Table 5 are essentially identical. The MVDA/CL forecast definitely excels in
the W score, although this is mainly due to its tendency for underprediction, which
places a large number of forecasts in the No Flare column. The tendency for
underprediction in the MVDA/CL forecast is evident also in the F scores for C
and M & X flares, which are significantly higher than the corresponding E scores.
On the other hand, both the comparison and the MVDA forecast are biased toward
overprediction. Their overall similarity is quite striking.

4.2  Effect  of  Training  Set  Size 

The  number  of  records  to  be  used  in  the  training  set  should  be  large  enough 
to  provide  sufficient  statistics  to  train  the  computer  program,  yet  small  enough 
to  avoid  the  effects  of  trends  in  the  data.  The  optimum  number,  while  not  known 
from  theory,  may  be  determined  empirically  by  varying  the  training  set  size  and 
comparing  the  scores  of  the  resulting  forecasts.  Table  6  shows  the  results  for 
training sets of 750 and 2095 records. Together with Table 5 (1500-record
training set) we find differences of only small significance. A close examination of


9.  Simon, P., Smith, J. B., Ding, Y., Flowers, W., Guo, Q., Harvey,
K. L., Hedeman, R., Martin, S. F., McKenna Lawlor, S., Lin, V.,
Neidig, D., Obridko, V. N., Dodson Prince, H., Rust, D., Speich, D.,
Starr, A., and Stepanyan, N. N. (1980) in Sol.-Terrest. Pred. Proc.,
Vol. 2, R. F. Donnelly (ed.), p. 287.
Table 6.  Comparison of Scores Using 750-Record
and 2095-Record Training Sets--Tests B and C

Forecaster       Event         F      E      A      U      W    Off 1   Off 2

MVDA             No Flare    93.9   86.0   89.9
750 Records      C           28.4   41.3   34.8   53.3   80.0   17.0     3.0
                 M & X       25.2   45.1   35.1

MVDA/CL          No Flare    91.7   93.7   92.7
750 Records      C           36.2   32.7   34.4   52.8   85.1   13.1     1.8
                 M & X       36.3   26.0   31.2

MVDA             No Flare    93.9   84.0   89.0
2095 Records     C           25.9   42.8   34.4   53.6   78.4   19.4     2.2
                 M & X       28.6   46.4   37.5

MVDA/CL          No Flare    91.0   95.0   93.0
2095 Records     C           38.0   29.4   33.7   54.3   85.8   12.8     1.4
                 M & X       45.8   26.4   36.1

the trend in the various scores, however, suggests that there may be some
improvement, especially in the MVDA/CL forecast, as the size of the training
set is increased from 750 to 1500 records. The improvement is less certain in
increasing the set from 1500 to 2095. According to motivations which will be
described later, the E score is of interest in the case of the MVDA forecast,
while the F score is of prime importance for MVDA/CL. Noting these, the U
scores, and the fact that we do not wish to make the training set unnecessarily
large, we have decided to use 1500 records in all training sets.

4.3  Inclusion of Additional Combination Parameters

Table  4  indicates  that  five  of  the  six  combination  parameters  from  Table  2 
were  retained  for  analysis  after  the  initial  parameter  selection.  Because  several 
of  these  ranked  highly  in  frequency  of  selection  in  Test  A,  we  decided  to  test 
additional combination parameters. As in the case of the original six, the
additional parameters were derived on the basis of intuition. Their formulas are
given  in  Table  7. 

The  20  new  combination  parameters,  in  addition  to  the  20  parameters  used 
in  Test  A,  were  submitted  to  analysis  in  Test  D  (Tables  8  and  9).  It  is  convenient 
to  defer  the  discussion  of  the  latter  to  the  following  section. 
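
The two families of additional parameters in Table 7, day-to-day changes and squares, can be sketched for a few entries as follows. The sketch assumes that each rate of change is today's value minus yesterday's for the same region, as the first Table 7 entry indicates; the function name and sample values are hypothetical, and the dictionary keys are Table 1 parameter numbers.

```python
def derived_parameters(today, yesterday):
    """Compute a subset of the Table 7 parameters for one region.

    today and yesterday map Table 1 parameter numbers to numeric values.
    """
    new7 = today[29] - yesterday[29]    # rate of change of parameter 29
    new8 = today[37] - yesterday[37]    # rate of change of parameter 37
    new10 = today[17] - yesterday[17]   # rate of change of parameter 17
    return {
        7: new7,
        8: new8,
        10: new10,
        17: today[29] ** 2,   # parameter 29 squared
        18: today[37] ** 2,   # parameter 37 squared
        22: new7 ** 2,        # (New 7) squared
        23: new8 ** 2,        # (New 8) squared
        25: new10 ** 2,       # (New 10) squared
    }

# Hypothetical values for one region on two consecutive days.
today = {17: 2, 29: 3, 37: 1}
yesterday = {17: 0, 29: 1, 37: 2}
derived = derived_parameters(today, yesterday)
```

Squaring a rate of change discards its sign, so the squared forms respond to the size of a change whether the parameter rose or fell.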


Table 7.  Additional Combination Parameters (Numbers
in right-hand column refer to original parameter numbers
in Table 1)

New Parameter No.        Parameter Formula

Rates of Change

        7                29 (today) - 29 (yesterday)
        8                37 - 37
        9                (9 · 10 · 11 · 12) - (9 · 10 · 11 · 12)
       10                17 - 17
       11                (12 · 17 · 27) - (12 · 17 · 27)
       12                38 - 38
       13                9 - 9
       14                (9 · 10 · 11) - (9 · 10 · 11)
       15                15 - 15
       16                12 - 12

Parameters Squared

       17                29^2
       18                37^2
       19                (9 · 10 · 11 · 12)^2
       20                17^2
       21                9^2
       22                (New 7)^2
       23                (New 8)^2
       24                (New 9)^2
       25                (New 10)^2
       26                (New 13)^2


Table 8.  Parameters Submitted to Analysis and Their Frequency
of Selection in 19 Subsets--Tests D, E, F, G, and H

       No. of
Test   Parameters    Parameter and Frequency of Selection

D      40            Flares Today    19     Radio B/S        6     Flare Hist.      2
                     New 18          17     New 9            6     New 4            2
                     New 2           16     New 12           6     New 23           2
                     Bright Pts.     14     New 1            5     Spot Class 2     1
                     New 19          12     Neut. L. Chg.    5     Emerg. Flux      1
                     New 15          10     New 5            5     New 17           1
                     Mag. Grad.       9     New 14           5     New 19           1
                     Proton Hist.     9     New 21           5     Spot Class 1     0
                     New 3            9     New 22           5     New 11           0
                     New 8            9     Mag. Pol.        4     New 13           0
                     New 20           8     New 7            4     New 16           0
                     Mag. Class       7     Spot Inter.      3     New 26           0
                     New 10           7     New 25           3
                     Spot Class 3     6     New 20           2

A      20            See Table 4

E      15            Flares Today    19     Spot Class 3    14     Radio B/S        6
                     Bright Pts.     19     Spot Dynam.     13     Spot Class 1     5
                     Mag. Class      16     Proton Hist.    11     Mag. Pol.        5
                     Mag. Grad.      16     Flare Hist.     10     Emerg. Flux      4
                     Spot Class 2    14     Neut. L. Chg.    6     Spot Inter.      3

F      8             Flares Today    19     Spot Class 2    16     Spot Dynam.     11
                     Bright Pts.     19     Spot Class 3    14     Spot Class 1     5
                     Mag. Class      17     Mag. Grad.      13

G      5             Flares Today    19     Mag. Class      18     Spot Class 3    15
                     Bright Pts.     19     Spot Class 2    16

H      3             Flares Today    19     Mag. Class      19     Spot Class 2    19


Table 9.  Effects of Reduction in the Number of Input Parameters

                          Number of
Test      Forecaster      Parameters      U       W     Off 1   Off 2     R

          COMPARISON                    52.8    79.2    18.6     2.2    2.22

TEST D    MVDA                40        53.8    79.5    18.4     2.1    2.38
          MVDA/CL                       54.6    85.1    13.4     1.5    0.73

TEST A    MVDA                20        53.6    78.9    18.4     2.6    2.63
          MVDA/CL                       54.3    85.5    12.8     1.7    0.60

TEST E    MVDA                15        52.4    78.1    18.7     3.2    2.80
          MVDA/CL                       53.9    85.4    12.8     1.8    0.65

TEST F    MVDA                 8        52.2    78.0    18.6     3.4    2.83
          MVDA/CL                       53.3    85.0    13.4     1.6    0.71

TEST G    MVDA                 5        53.0    77.6    18.5     3.9    3.02
          MVDA/CL                       53.7    84.4    14.0     1.7    0.83

TEST H    MVDA                 3        51.5    78.2    17.5     4.3    2.72
          MVDA/CL                       53.5    85.2    13.3     1.5    0.61

4.4  Reduction in the Number of Parameters

The computer forecast was subjected to a series of reductions (Tests E, F,
G, and H) in the number of input parameters, according to Table 8, with the
corresponding forecast results summarized in Table 9. Table 9 displays the
effects of parameter reduction beginning with 40 parameters and ending with
only three. In addition to the previously used scores we introduce R, the ratio
of the number of matrix entries below the diagonal to the number above the
diagonal. This ratio provides a measure of the asymmetry of the forecast, with
values greater than unity indicating overprediction, and values less than unity
indicating underprediction.
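
The ratio R can be sketched directly from a forecast matrix whose rows are the forecast class and whose columns are the observed class. The function name is hypothetical, and the two sample matrices are illustrations chosen to be consistent with the Test A scores for the MVDA and MVDA/CL forecasts.

```python
def asymmetry_ratio(matrix):
    """Return R = (entries below the diagonal) / (entries above the diagonal).

    matrix[i][j] = number of forecasts of class i for which class j was observed,
    so entries below the diagonal are forecasts larger than the observed event.
    """
    n = len(matrix)
    below = sum(matrix[i][j] for i in range(n) for j in range(n) if i > j)
    above = sum(matrix[i][j] for i in range(n) for j in range(n) if i < j)
    return below / above

mvda = [[3349, 190, 26],
        [513, 206, 51],
        [92, 98, 70]]
mvda_cl = [[3739, 316, 50],
           [185, 142, 50],
           [30, 36, 47]]

r_mvda = asymmetry_ratio(mvda)        # greater than unity: overprediction
r_mvda_cl = asymmetry_ratio(mvda_cl)  # less than unity: underprediction
```

The two values fall on opposite sides of unity, which is the difference in character between the MVDA and MVDA/CL forecasts expressed as a single number.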

Table  9  clearly  illustrates  that  the  reduction  in  the  number  of  parameters 
has  a  small  but  unfavorable  effect  on  the  computer  forecasts.  We  may  regard 
the  tendencies  for  R  to  depart  further  from  unity,  for  Off  2  to  increase,  and  for 
U  to  decline,  as  evidence  for  progressively  worsening  forecasts.  These  three 
effects are most noticeable in the MVDA forecast, while the latter effect alone is
marginally evident in MVDA/CL.


The effects of the parameter reduction are offset by the increase in the
number of records containing all or most of the parameters submitted for
analysis in the reduced sets. This improvement in representation occurs because
in the reduction steps we usually eliminated those parameters that were least
significant; i.e., those chosen least often in the subsets of the previous test;
and, generally, the lower the significance of a parameter, the more often it is
missing from the data base. It is concluded, therefore, that the decline in
forecast quality in Table 9 would have been more pronounced had all parameters
been present in all records. This proves that there is valuable predictive
information contained in at least some of the less significant parameters. It is
emphasized that, perhaps to a large degree, the lower significance of these
parameters is due only to their frequent absence from the data base.

A final word must be noted regarding the combination parameters. Table 8
indicates that a number of these new parameters have been selected by the
computer program as significant in classifying the outcomes. Due to the complex
intercorrelations among various parameters, however, in addition to possible
variance stabilization effects and other statistical phenomena, we do not fully
understand the true significance of these combination parameters. Questions
such as this probably must await further testing on data bases containing fewer
missing parameters.

Tests on a Fully Represented Data Base

The most important test of the computer forecast is achieved in the case
where all the parameters submitted to analysis are present in all records of the
data base. Such a test, using the full set of parameters, is impossible with the
presently available data. A test can be made on a fully represented base,
however, if, for example, only eight parameters are used, and we are willing to
accept a reduced base of 3733 records, of which only 2233 remain in the test set.
Such a test (I) was performed, and the results are shown in Tables 10, 11, and 12.
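
As an illustration only, the selection of a fully represented base and its
division into training and test sets can be sketched as follows. The sketch
assumes that each record is a mapping from parameter name to value, with a
missing observation stored as None, and that records are held in time order so
that the training set precedes the test set. The function names, the record
layout, and the sample values are hypothetical; they are not taken from the
program used in this study.

```python
def fully_represented(records, parameters):
    """Keep only the records in which every listed parameter is present."""
    return [r for r in records if all(r.get(p) is not None for p in parameters)]


def split_chronologically(records, n_train):
    """Use the first n_train records for training and the remainder for testing."""
    return records[:n_train], records[n_train:]


# Hypothetical records in time order; None marks a missing observation.
records = [
    {"flares_today": 2, "mag_class": 3, "bright_pts": 1},
    {"flares_today": 0, "mag_class": None, "bright_pts": 0},
    {"flares_today": 1, "mag_class": 2, "bright_pts": 4},
    {"flares_today": 0, "mag_class": 1, "bright_pts": None},
    {"flares_today": 3, "mag_class": 3, "bright_pts": 2},
]
parameters = ["flares_today", "mag_class", "bright_pts"]

complete = fully_represented(records, parameters)
training_set, test_set = split_chronologically(complete, n_train=2)
print(len(complete), len(training_set), len(test_set))
```

Restricting the list of parameters enlarges the set of complete records, which
is the trade-off accepted in Test I.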

Test I shows a dramatic improvement in the MVDA/CL computer forecast
in all scores, while the MVDA and comparison forecasts show smaller
improvements. These improvements occur despite the somewhat lower flare
climatology that applies to this particular test set. The fact that the
comparison (subjective) forecast scores are higher indicates that the more
complete observational coverage during this sample of records somehow benefits
the subjective methods also.

Due  to  the  reduced  number  of  records,  the  errors  associated  with  the  Test  I 
scores  are  about  50  percent  higher  than  those  stated  earlier.  Nevertheless,  there 
now  seems  no  question  that  the  MVDA/CL  forecast  is  superior  to  the  others. 




Table 10. Comparison of Forecasts Using a Fully
Represented Data Base--Test I (1500-record training set)

Largest Event         Largest Event Observed         Total
Forecasted          No Flare         C     M & X Forecasts

COMPARISON
  No Flare              1754        90         8      1852
  C                      193        70        24       287
  M & X                   28        31        34        93
  Total Events          1975       191        66      2232

MVDA
  No Flare              1707        67        10      1784
  C                      232        92        24       348
  M & X                   36        32        32       100
  Total Events          1975       191        66      2232

MVDA/CL
  No Flare              1829        97        14      1940
  C                      145        82        33       260
  M & X                    1        12        19        32
  Total Events          1975       191        66      2232
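
The following minimal sketch shows how per-class scores can be recomputed from
the counts of Table 10. It rests on assumptions that are not stated in this
section: that F is the percentage of forecasts of a class that verified, that E
is the percentage of observed events of a class that were correctly forecast,
and that "Off 2" is the percentage of all forecasts that missed the observed
class by two classes. The function names are hypothetical.

```python
def class_scores(table):
    """Return assumed per-class F, E, and climatology, all in percent.

    table[i][j] is the number of cases forecast as class i and observed as class j.
    """
    n = len(table)
    forecast_totals = [sum(row) for row in table]
    observed_totals = [sum(table[i][j] for i in range(n)) for j in range(n)]
    grand_total = sum(forecast_totals)
    scores = []
    for k in range(n):
        scores.append({
            "F": 100.0 * table[k][k] / forecast_totals[k],
            "E": 100.0 * table[k][k] / observed_totals[k],
            "climatology": 100.0 * observed_totals[k] / grand_total,
        })
    return scores


def percent_off_by(table, distance):
    """Percent of all cases whose forecast missed the observed class by `distance` classes."""
    n = len(table)
    grand_total = sum(sum(row) for row in table)
    missed = sum(table[i][j] for i in range(n) for j in range(n) if abs(i - j) == distance)
    return 100.0 * missed / grand_total


# Rows: forecast No Flare, C, M & X. Columns: observed No Flare, C, M & X (Table 10).
MVDA = [[1707, 67, 10], [232, 92, 24], [36, 32, 32]]
MVDA_CL = [[1829, 97, 14], [145, 82, 33], [1, 12, 19]]

for name, table in (("MVDA", MVDA), ("MVDA/CL", MVDA_CL)):
    scores = class_scores(table)
    f_scores = [round(s["F"], 1) for s in scores]
    e_scores = [round(s["E"], 1) for s in scores]
    print(name, "F:", f_scores, "E:", e_scores,
          "off by two:", round(percent_off_by(table, 2), 1))
```

Under these assumed definitions, a high F for a flare class corresponds to few
false alarms, and a high E corresponds to few missed events.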

Table 11. Parameters Submitted to Analysis and
Their Frequency of Selection in 9 Subsets--Test I

Parameter        Frequency

Flares Today         9
Mag. Class           8
Spot Class 2         5
Bright Pts.          9
Mag. Grad.           8
Spot Class 1         1
Spot Class 3         8
Spot Dynam.          8
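
The reduction step described earlier, in which the parameters chosen least
often in the subsets of the previous test are eliminated, can be illustrated
with the counts of Table 11. This is a sketch only: the function name and the
choice of dropping two parameters are hypothetical, and no such further
reduction is reported here for Test I.

```python
# Number of the 9 subsets in which each parameter was selected (Table 11).
selection_counts = {
    "Flares Today": 9,
    "Mag. Class": 8,
    "Spot Class 2": 5,
    "Bright Pts.": 9,
    "Mag. Grad.": 8,
    "Spot Class 1": 1,
    "Spot Class 3": 8,
    "Spot Dynam.": 8,
}


def drop_least_selected(counts, n_drop):
    """Return the parameters kept after removing the n_drop least often selected ones."""
    # sorted() is stable, so parameters with equal counts keep their input order.
    ranked = sorted(counts, key=lambda name: counts[name])
    dropped = set(ranked[:n_drop])
    return [name for name in counts if name not in dropped]


print(drop_least_selected(selection_counts, n_drop=2))
```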



Table 12. Comparison of Forecast Scores--Test I

1  'orecaster 

pM'Ilt 

1 

1-;  A 

C  I  A  (Iff! 

(  Iff  8 

t:OMP.\HlS().N 

No  flare 

94*7 

91.8 

88. 

C 

24.4 

0  40.  .) 

8.8  fi:-;.8  1X1 

1.8 

M  IX  X 

■iG.n 

■.1.4  44.1 

8.  0 

.\I\U.\ 

Nci  I'lare 

9;').  7 

88.4  91,1 

88.  4 

( ' 

2fi.  4 

18.8  .■17..; 

8.8  48.8  88. 'J  1  a  “ 

2,1 

M  .1,  X 

.:2. 9 

10.8 

8.  0 

\l\  D,\/('l. 

No  flare 

''4 .  4 

08.8 

88.  4 

(.' 

-.1.  > 

48.9  :;7,8 

8.8  :i8.:-;  88.  4  18.  ;j 

0.  7 

M  x 

Jh.d  44.1 

.4.0 

5. CONCLUSIONS AND RECOMMENDATIONS

The conclusions of this study may be summarized as follows:

1. The standard MVDA forecast is very similar to the comparison forecast
used in this study in terms of overall accuracy and bias toward overprediction.

2. The MVDA/CL forecast is superior overall to either the MVDA or the
comparison forecast, and is biased toward underprediction.

3. The optimum size for the training set is probably about 1500 records
for the climatologies that prevailed during 1977 and 1978.

4. Flares Today is the most valuable prediction parameter in the data
base used here, with the Bright Points parameter a very close second.
Other important parameters are Magnetic Class, Magnetic Gradient, Spot
Class, and Sunspot Dynamics.

5. Combination parameters, although their role is not fully understood,
seem to improve forecast scores.

6. Some of the often missing parameters (which probably, therefore, only
appear to be less significant as predictors) contain valuable predictive
information. Probable candidates include Radio Burst/Sweep, Neutral Line
Changes, Neutral Line Complexity, and Emerging Flux.

7. The MVDA/CL procedure may be capable of producing forecasts superior
to any presently available using conventional, subjective techniques. It has been
shown that its skill becomes markedly evident when complete parameter
representation is achieved in the data base. On the basis of this, we predict
that with improvements in data consistency, as well as the inclusion of new,
objective parameters in the future, the computer forecast scores will continue
to improve.


This  study  has  led  us  to  make  the  following  recommendations  concerning 
the  use  of  the  two  computer  forecasts: 

1.  Provide  a  flare  forecast  derived  from  MVDA/CL  for  those  customers 
who  cannot  tolerate  false  flare  alarms  (note  the  comparison  of  F  scores  in 
Table  12). 

2.  Provide  a  flare  forecast  derived  from  standard  MVDA  for  those  cus¬ 
tomers  who  need  to  be  forewarned  of  flares  as  often  as  possible  (compare  E 
scores  in  Table  12). 

3. Improve the coverage for the parameters in Table 1 that are deemed
"less significant" by virtue of their frequent absence in the data base.

4.  Improve  the  objectivity  and  consistency  of  all  parameters. 
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